Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank

Authors

  • Yusuke Miyao
  • Takashi Ninomiya
  • Jun'ichi Tsujii
Abstract

This paper describes a method for semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially specified derivation trees. Lexical entries are then automatically extracted from the annotated corpus by inversely applying schemata to these partially specified derivation trees.
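The idea of inverse schema application can be illustrated with a toy model. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: a sign is reduced to a part of speech plus SUBJ and COMPS lists of category labels, only two binary schemata (head-subject and head-complement) are modeled, non-head daughters are assumed to be saturated, and there is no unification over typed feature structures. The names `Sign`, `Node`, `extract` and `extract_lexicon` are hypothetical. Starting from a saturated sign at the root, each schema is run "backwards" to compute the head daughter's sign, and whatever sign reaches a leaf becomes that word's lexical entry.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sign:
    """Toy stand-in for an HPSG sign (assumption): head POS plus valence lists."""
    pos: str
    subj: Tuple[str, ...] = ()
    comps: Tuple[str, ...] = ()


@dataclass
class Node:
    """Hypothetical node of a partially specified derivation tree."""
    label: str                       # treebank category, e.g. "S", "VP", "NP"
    pos: str                         # part of speech of the node's lexical head
    schema: Optional[str] = None     # "head-subject" or "head-complement"
    head: Optional[Node] = None      # head daughter (internal nodes only)
    nonhead: Optional[Node] = None   # non-head daughter (internal nodes only)
    word: Optional[str] = None       # surface word (leaves only)


def extract(node: Node, sign: Sign, lexicon: List[Tuple[str, Sign]]) -> None:
    """Push the mother's sign down the tree by inverting each schema."""
    if node.word is not None:
        # Leaf: the sign that reached this word is its lexical entry.
        lexicon.append((node.word, sign))
        return
    assert node.head is not None and node.nonhead is not None
    arg = node.nonhead.label
    # The head daughter shares the mother's POS (Head Feature Principle).
    if node.schema == "head-complement":
        # Forward: mother COMPS = head COMPS minus its first element.
        # Inverse: prepend the complement to the mother's COMPS.
        head_sign = Sign(sign.pos, sign.subj, (arg,) + sign.comps)
    elif node.schema == "head-subject":
        # Inverse: prepend the subject to SUBJ; COMPS is copied unchanged.
        head_sign = Sign(sign.pos, (arg,) + sign.subj, sign.comps)
    else:
        raise ValueError(f"unknown schema: {node.schema!r}")
    extract(node.head, head_sign, lexicon)
    # Simplifying assumption: non-head daughters are fully saturated.
    extract(node.nonhead, Sign(node.nonhead.pos), lexicon)


def extract_lexicon(root: Node) -> List[Tuple[str, Sign]]:
    """Collect (word, sign) pairs, starting from a saturated root sign."""
    lexicon: List[Tuple[str, Sign]] = []
    extract(root, Sign(root.pos), lexicon)
    return lexicon


if __name__ == "__main__":
    # "John likes Mary", already annotated with heads and schemata.
    tree = Node("S", "verb", "head-subject",
                head=Node("VP", "verb", "head-complement",
                          head=Node("V", "verb", word="likes"),
                          nonhead=Node("NP", "noun", word="Mary")),
                nonhead=Node("NP", "noun", word="John"))
    for word, entry in extract_lexicon(tree):
        print(word, entry)
    # likes Sign(pos='verb', subj=('NP',), comps=('NP',))
    # Mary Sign(pos='noun', subj=(), comps=())
    # John Sign(pos='noun', subj=(), comps=())
```

In this toy run the transitive verb "likes" receives an entry selecting one NP subject and one NP complement purely from the shape of the annotated tree, which is the intuition behind deriving lexical entries from partially specified derivations rather than writing them by hand.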

Publication date: 2004